Abstract

This paper is a survey of a representative set of real-time operating
systems, primarily designed to support Robotics control systems and
used in Robotics research environments. We discuss the special
requirements for operating systems in robotics control applications and
contrast these systems with "normal" time-sharing systems. The main
features of these real-time operating systems are studied with emphasis
on their architecture, the nature of processes, the means of
inter-process communication, and programming characteristics.


1      Introduction 


This paper is a survey of some operating systems primarily designed for
robot control systems. Of particular concern in this survey are the systems
used at the low end of the control hierarchy. Robotic devices are growing
in complexity both in the degrees of freedom to be coordinated and in the
sensory input available. Compare, for example, a typical six degree of
freedom arm with position sensors for each joint with the Utah/MIT hand,
which has 16 degrees of freedom with both position and torque sensors for
each joint. The complexity of the tasks requested of robot control systems
has grown accordingly, so that the computing power of contemporary control
computers and operating systems is being strained.

Operating systems for robot control systems fall within the category of
real-time operating systems. Perhaps the most salient feature of real-time
operating systems is preemptive scheduling, which means that it is possible
for a high priority task or tasks to demand immediate access to the
processor so that some real-time constraint can be met. Many real-time
operating systems are otherwise normal operating systems with preemptive
scheduling added. A characteristic of low level robot control, the servo
loop, permits further refinement of the operating system, to the point that
some of the systems discussed in this survey bear little resemblance to
normal operating systems. Servo loops demand repetitive and timely service,
and a robot control system is likely to have many loops. Special scheduling
techniques can be used because of the repetitive nature of the loops. The
demands of timely service (particularly in the high frequency loops of low
level control) leave little tolerance for blocking of tasks for
indeterminate lengths of time. As a result one finds that queues play a
diminished role in the systems presented in this survey when compared to
normal operating systems.

The next section discusses the environment in which these operating
systems must work and some general characteristics of the systems. Section 3
summarizes some robot control operating systems with particular emphasis
put on the computational architecture used, the nature of processes and
inter-process communication, and the programming styles imposed or
suggested by the systems. Finally, section 4 presents a few conclusions about
robot control systems.

2      Robot  Control  Systems

The dominant characteristic of robot control systems is that the system must
operate within real-time constraints. In practice these constraints are on the
order of tens of milliseconds. These constraints impose hard limits which, if
violated, very likely mean irrecoverable failure. For instance, the failure of
a robot controller to respond in a timely manner to a sensor may cause the
robot to damage itself or crush the object being manipulated. This sort of
failure represents total failure of the system. This is in contrast to a
time-sharing system in which a time out during file transfer may cause the
transfer to be aborted but, in general, does not mean the entire system has
failed. In fact, the aborted transfer often will be restarted automatically.
Another consequence of the fact that the limits imposed on the system are
real is that the worst case performance of a system component is usually
more important than its average or typical case performance. For this
reason, queues, which are commonly used in time-sharing systems to control
demands for scarce resources, are rarely used in real-time systems. Some of
the systems, notably MUSE [Sieg85] [Sieg86] and NYMPH [Chen86], remove
almost all queue constructs, while Harmony [Gent81] [Gent84] uses queues but
encourages a programming style that carefully controls their use.

Despite the high computing power often found in these systems it is
appropriate to consider them as small systems. Although they are typically
multi-tasking and multi-processor systems they are "single user" systems in
the sense that the system is dedicated to a single "job". Just as a user may
go through phases of editing, compiling, and running, a robot system may go
through markedly different states. Consider a robot arm with a force sensing
tool at the end. When the tool is not in contact with any object the arm
may move freely using only desired position to control the motion, but as the
tool approaches an object a different control regime may be used to make
contact with the object, and then another control regime may be invoked
to have the tool perform the desired tasks. The entire system, including
the operating system, may reflect these different regimes: different types of
processes may be used, different methods of inter-process communication
may be available, and so on. This is in contrast to a time-sharing operating
system which struggles to maintain the single state of being "up" in which
all of its functions are available. Another attribute of smallness is that
robot control systems tend to have a single address space (or maybe one per
processor). This reduces overhead during context switches and simplifies
inter-process communication.

Secondary storage is rare, particularly at the lower control levels, largely
because the response time required of a system is often less than the access
time for a disk, thus making demand paging or consulting a disk resident
data base infeasible. As a consequence, entire systems tend to be
stored in memory as a single unit and then executed. It is acceptable to
statically allocate many things that a time-sharing operating system would
have to dynamically allocate. For instance, some of the systems discussed
here require that all processes be specified and given identifiers at the
system binding time. Also, resources such as buffers may be allocated
statically, partly to reduce overhead but more importantly to reduce
uncertainty in the time required to obtain the resource. Another outcome of
the lack of secondary storage is the paucity of user tools. Program
development is almost always done off-line and usually on another machine.
Interactive capabilities are rudimentary unless special purpose devices
(teach pendants etc.) are provided. Debugging tools are lacking, but there
are more fundamental reasons inhibiting the development of debugging tools.
Human response time is much longer than the system cycle time, which makes
interactive debugging at the lowest levels difficult, and since the time
constraints on the system are real constraints, single stepping through
cycles is often impossible. Another hindrance to effective debugging tools is
again that time is of the essence and that "turning on a debugger" may
adversely affect the system's timing, either by making the system too slow
or by masking or exacerbating a race condition.
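
The static allocation of buffers mentioned above can be illustrated with a
small sketch. It is written in Python purely for readability and is not taken
from any of the surveyed systems; the class name StaticBufferPool and its
methods are hypothetical. All buffers are created once at start-up, and
obtaining one afterwards takes a bounded number of steps and never blocks:
the caller is told immediately when the pool is exhausted.

```python
class StaticBufferPool:
    """Fixed set of buffers created once, before the control loops start."""

    def __init__(self, count, size):
        # All storage is set aside here; nothing is allocated later on.
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))   # indices of unused buffers

    def acquire(self):
        """Return the index of a free buffer, or None if none is left.

        The call never waits, so its worst case time is easy to bound.
        """
        return self._free.pop() if self._free else None

    def release(self, index):
        """Hand a buffer back to the pool."""
        self._free.append(index)

    def buffer(self, index):
        """Access the storage behind an acquired index."""
        return self._buffers[index]
```

The price, as noted in the discussion of open systems, is that the programmer
must decide the number and size of buffers in advance.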

It seems to be in the nature of the devices being controlled that they
require frequent and cyclic attention. Thus the rather timeless command
"Draw a line with chalk." is more precisely "Draw a line with chalk within
a few seconds and don't let the chalk screech." or, more appropriately for
robot control, "Maintain a constant force against the wall while moving along
the wall at a specified rate.", which is implemented by several processes
furiously checking forces and adjusting trajectories over and over again.
This cyclic behavior permeates control programs to such an extent that the
operating system can take advantage of it to improve system performance.
AL [Fink74], OWL [Donn84], and MUSE [Sieg85] [Sieg86] implement
this cyclic behavior at a low level and virtually eliminate the notion of
asynchronous tasks on a single processor.
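
One common way to exploit this cyclic behavior is a cyclic executive, in
which the order of execution for every clock tick is worked out ahead of
time. The following Python sketch is only an illustration of that general
idea, not the mechanism of AL, OWL or MUSE; the task names and periods are
hypothetical.

```python
def build_schedule(tasks, n_ticks):
    """Return, for each tick, the names of the tasks released on that tick.

    tasks maps a task name to its period in ticks. Tasks released on the
    same tick are ordered by period, shortest first, so that the highest
    frequency loop is always served first.
    """
    schedule = []
    for tick in range(n_ticks):
        due = [name for name, period in tasks.items() if tick % period == 0]
        due.sort(key=lambda name: tasks[name])
        schedule.append(due)
    return schedule


# Hypothetical loops: a joint servo every tick, a force check every second
# tick and a trajectory update every fourth tick.
tasks = {"joint_servo": 1, "force_monitor": 2, "trajectory_update": 4}
schedule = build_schedule(tasks, 4)
```

Because the table repeats every four ticks, the load in each tick is known
before the system runs and no task ever waits in a queue of unknown length.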

For the modern systems the architecture universally consists of several
micro-computers on a shared bus. Economics is clearly one reason for this
but the modularity provided by this architecture is also important. In a
single computer environment any addition to a system, a new device or a
higher level of control, degrades the performance of all the other components
of the system. For instance, an addition to a high level planner may starve
an otherwise accurate joint servo process or a critical safety procedure. In
practice, with single board computers, only the interaction with components
closely associated with a new component or feature need be considered.
Loosely, one could say that the designers' problem is reduced from O(n^2)
(where n is the number of "components") to O(n), but this is an illusion
brought on by the low utilization of the common resource, the shared bus.
The systems in use today use a handful of computers on a single bus and
the interprocessor communication is purposefully kept to a minimum. When
one considers future systems with scores of computers and high bandwidth
features such as file servers and data logging it is no longer possible to
neglect the effect on the bus of each processor or system component.


Multiprocessor architectures are used in all the systems but they tend to
be loosely coupled systems. Typically, processes are statically assigned to
processors and there is no attempt to distribute the load dynamically among
the processors. As above, both economics and simplicity of design account
for this in today's systems.


Another important characteristic of these systems is that they are open
systems. This means that there is little or no attempt to protect the system
from the users or to protect the users from themselves. To a large extent
this is due to the fact that the systems have been developed for the research
environment. As mentioned before, a single address space is common and
the distinction between system code and user code is blurred, so typically
everything runs in supervisor state. Another way to state this is that the
systems are designed for "intelligent" or "knowledgeable" users. Frequently,
the group that designs an operating system is also its primary user, so
there is some basis for the latter characteristic. Another excuse for this
open attitude is that the cost of enforcing protections is considered to be too
high in the real-time environment. That a user may be occasionally stymied
because they have to statically allocate a buffer, for instance, is a reasonable
price for the low cost of obtaining buffers during execution. On the other
hand, this openness can make it difficult to enforce safety requirements for
the robot itself or people near the robot when it is operating.

3      Example  Operating  Systems

This section presents some real or planned systems for robot control. The
references cited have a range of focuses, from implementation of a particular
robot system, to presentation of a robot programming language, to
discussion of general characteristics of real-time systems. For this paper we
will focus on three aspects: the computational architecture proposed or
required by the system (there is little variation here), the nature of tasks
or processes in the system and the means of inter-process communication, and
the programming style implied or required by the authors.

3.1      AL 

AL [Fink74] is a language with an embedded operating system for
programming robots in assembly tasks. It was developed during the mid 1970's,
before the advent of micro-computers, so the restrictions on real-time
computing power that so profoundly mark AL are greatly reduced today.
However, AL is interesting because of the tremendous scope addressed by the
language and the techniques used to overcome the limitations in computing
power.

AL was developed in an environment consisting of two six
degree-of-freedom robots directly controlled by a PDP-11/45, with a PDP-10
for higher level functions, program development and the user interface. This
configuration of a dedicated control computer capable of only the lowest
control and a powerful but expensive general purpose computer for higher
functions is fundamental to the language. The fact that the PDP-11 was
capable of only the lowest control functions means that the arm trajectories
have to be pre-planned by the compiler, which runs on the PDP-10. Thus before
a move command can be compiled the compiler must know, at compile time,
the positions the arms are expected to have at run time! One consequence
of this is that there are no run time subroutines. A subroutine can
expect to be called in various circumstances, so the compiler, when compiling
a subroutine, has no expected run-time values for the subroutine's variables
or parameters. Libraries of routines are implemented but are compiled as
macro expansions.

Since the compiler's planning world of values (e.g. arm position) will
never exactly match the actual run time world, the system has facilities for
handling the discrepancies. The run time system is capable of overcoming
small differences between the real and planned world. The bulk of the
responsibility for handling the planning-world/real-world discrepancies is
left to the programmer and compiler. The programmer has two worlds
to work in: the world of real values determined at run time and the world
of planning values, which are the compiler's estimate of the value of the
variables at run time. The programmer is able to examine and manipulate
the planning values and to affect the compilation with these planned values.
There are instances in which the compiler is unable to maintain unambiguous
planning values and the programmer must assist the compiler. In fact, the
bulk of the AL manual is dedicated to describing a multitude of ways for
the programmer to scatter hints to the compiler about what the run time
situation is likely to be.

The AL specification includes very high level capabilities such as
knowledge about the objects being manipulated and the ability to optimize the
sequence of tasks based on its knowledge of the configuration. Such
capabilities are not integral to the other systems presented here and are
assumed to exist somewhere, typically in a high level host computer that
communicates with the real-time control system.

An interesting aspect of AL, due to its rigid separation of planning and
run time environments, is that the programmer must make very clear his or
her assumptions about what the real situation will be at run time. One
imagines that major parts of both design and debugging efforts would be
spent in finding the appropriate set of "hints" to give the compiler so that
the desired task can be done in an efficient and robust manner. But making
a clear and complete task description is a major part of any programming
effort, and AL provides a model of good programming discipline for real-time
robot control.

The AL run time system consists of five types of processes distinguished
by function and priority. In order of their priority (high to low) they are:

•  Joint  servos  and  encoder  input 

•  Clock,  calendar  (scheduler) 

•  Condition  monitors 

•  Servo  predictors 

•  Interpreters 

AL divides time into equal length slots, 1 ms in length. In one slot at
most one of the joint servo or encoder sensing processes is scheduled; this
process is given the first opportunity during the slot. Lower level processes
then compete for the remainder of the slot.
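
The slot discipline just described can be sketched roughly as follows. This
is an illustration in Python, not AL's implementation: the function names,
the process costs, the rule of moving a conflicting reservation to the next
free slot, and the rule of skipping a lower priority process that does not
fit are all assumptions made for the example, and real preemption is not
modelled.

```python
SLOT_US = 1000  # one slot of 1 ms, expressed in microseconds


def reserve(calendar, servo, earliest_slot):
    """Book the first slot at or after earliest_slot that holds no servo.

    calendar maps a slot number to the single servo or encoder process
    allowed in that slot.
    """
    slot = earliest_slot
    while slot in calendar:
        slot += 1
    calendar[slot] = servo
    return slot


def run_slot(calendar, slot, servo_cost_us, lower_priority):
    """Return the names of the processes that run in the given slot.

    The reserved servo process, if any, goes first. The other processes,
    given as (name, cost_us) pairs from high to low priority, then share
    whatever time is left in the slot.
    """
    ran = []
    remaining = SLOT_US
    servo = calendar.pop(slot, None)
    if servo is not None:
        ran.append(servo)
        remaining -= servo_cost_us
    for name, cost_us in lower_priority:
        if cost_us <= remaining:
            ran.append(name)
            remaining -= cost_us
    return ran
```

For example, with two servo processes both asking for slot 5, the second is
given slot 6, and in slot 5 the servo runs before any monitor, predictor or
interpreter gets the processor.
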

Each joint servo process is paired with a lower priority servo predictor
process. When started, a joint servo process drives a single actuator and
then hands control to the corresponding predictor process. The predictor
first schedules the next execution of the servo process by consulting the
calendar. Now knowing the time the servo process will next run and the
current position and velocity of the joint, the predictor uses the trajectory
polynomial to predict the state of the joint when the servo process is next
scheduled, and thus it can determine the action to be taken by the servo
process. When the reserved slot arrives the servo process uses the planned
values, with slight modification based on more recent information, to drive
the actuator. When a motion is complete or aborted the servo process and
predictor process remove themselves from the system.
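
The division of labor between predictor and servo can be sketched as below.
This is a hedged illustration in Python, not AL's code: the cubic trajectory,
the function names and the simple proportional correction standing in for
the "slight modification based on more recent information" are assumptions.

```python
def poly_eval(coeffs, t):
    """Evaluate c0 + c1*t + c2*t**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def predict_setpoint(coeffs, t_next):
    """Predictor: planned joint position at the time of the reserved slot."""
    return poly_eval(coeffs, t_next)


def servo_command(planned, measured, gain):
    """Servo: planned value adjusted by the most recent position error."""
    return planned + gain * (planned - measured)


# Hypothetical trajectory 3*t**2 - 2*t**3, moving a joint from 0 to 1 over
# one unit of time with zero velocity at both ends.
coeffs = [0.0, 0.0, 3.0, -2.0]
planned = predict_setpoint(coeffs, 0.5)
command = servo_command(planned, measured=0.48, gain=0.5)
```

The expensive part, evaluating the trajectory, is done by the low priority
predictor ahead of time, so that the high priority servo has only a small
correction left to compute in its slot.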

Condition monitors are processes that monitor various parts of the
system, for example the length of time taken by a move, sensed forces or
touch pad sensors. When a condition monitor is tripped it initiates actions
specified by the compiled code. The action may be critical, in which case it
is immediately executed, or it may simply be scheduled along with other
processes. The condition monitors are of two varieties, hardware and
software. The hardware monitors are simply hardware interrupt procedures for
the appropriate equipment. The software monitors are routines that are
executed at regular intervals. AL integrates the software priority mechanism
with the PDP-11's hardware priority mechanism. Thus the hardware
interrupts that are part of the condition monitors can not interrupt higher
priority processes (e.g. servo processes) because the servo processes run at
higher hardware priority.

Interpreters  are  the  lowest  level  processes.  These  provide  the  higher 
level  run  time  control.  They  start  moves,  enable  and  disable  monitors, 
perform  calculations  and  spawn  new  processes.  The  interpreted  processes 
don't  execute  PDP-11  code  directly  but  interpret  code  for  a  virtual  stack 
machine.  Presumably,  the  reason  for  this  is  that  it  is  easy  to  implement 
re-entrant  procedures  this  way,  the  context  for  processes  can  be  controlled 
to  minimize  context  switch  times,  and  code  generation  for  the  compiler  is 
simplified. 

As mentioned previously, all trajectories are planned by the compiler
based on assumed values for the various arm variables. At run time the
real values will not, in general, match the values assumed by the compiler.
Before a move is started the real state of the system is compared to the
assumed state. If the difference is small then small adjustments to the
planned trajectory can be made that preserve the desired characteristics of
the move. If, on the other hand, the difference is large then a more brute
force method is used to bring the robot arm to the desired configuration.

Lazy evaluation is used on the run time variables. Each variable has two
cells associated with it, a value cell and a node cell. The value cell holds the
variable's current value (e.g. a scalar, a vector, a plane, or a frame). The node
cell holds information about the freshness of the value and how to calculate
a fresh value, if needed. This calculation may require that other variables
be updated, and so on until the required fresh value is available. The node
cell points to a list of procedures that will bring the variable up to date. The
node cell also points to a list of variables that depend on this variable, that
is, variables that must be marked invalid if this variable becomes invalid.
These nodes thus form a graph that represents the state of the run time
variables.
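
The value cell and node cell arrangement can be illustrated with a short
sketch. It is written in Python and follows the description above only
loosely: the class Node, its field names and the single compute procedure
per variable (rather than a list of procedures) are simplifications assumed
for the example.

```python
class Node:
    """One run time variable: a value cell plus node cell bookkeeping."""

    def __init__(self, name, compute=None, inputs=()):
        self.name = name
        self.value = None            # value cell
        self.valid = False           # freshness of the value
        self.compute = compute       # how to obtain a fresh value
        self.inputs = list(inputs)   # variables the computation reads
        self.dependents = []         # variables to invalidate with this one
        for node in self.inputs:
            node.dependents.append(self)

    def set(self, value):
        """Store a new value and mark everything derived from it stale."""
        self.value = value
        self.valid = True
        for node in self.dependents:
            node.invalidate()

    def invalidate(self):
        if self.valid:
            self.valid = False
            for node in self.dependents:
                node.invalidate()

    def get(self):
        """Return a fresh value, recomputing only if the value is stale."""
        if not self.valid:
            args = [node.get() for node in self.inputs]
            self.value = self.compute(*args)
            self.valid = True
        return self.value


# Hypothetical variables: a grasp position derived from two measured values.
base = Node("base")
offset = Node("offset")
grasp = Node("grasp", compute=lambda b, o: b + o, inputs=[base, offset])
base.set(2.0)
offset.set(3.0)
```

Here grasp.get() computes the sum on first use, returns the stored value on
later calls, and recomputes only after base or offset has been set again.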


3.2      NRTX 

The New Real-Time Executive (NRTX) [Kapi84] is a real-time control
system developed at Bell Laboratories. The strategy of the developers was to
take an existing time-sharing system, UNIX, remove unnecessary and time
consuming features (e.g. file system, multi-user support) and add real-time
support features (e.g. improved interprocess communication, preemptive
scheduling). The appeal is that the programmer has a familiar and
presumably comfortable system interface. Also, it is possible to take
advantage of existing libraries and tools in some cases.

NRTX runs on 68000 based computers with a VAX or Sun host connected
via Ethernet. A control system might have several computers sharing a bus
or connected via a parallel port, but there is no built-in support for
inter-processor communication. Support is provided for invoking routines on
the host so that the host's file system is available to the control computer.
This allows for a fairly tight coupling between the host and the control
computer.

Processes in NRTX are regular UNIX processes but with a single
address space (in UNIX a spawned process gets a private copy of the parent's
variables). Four methods of interprocess communication are supported.

•  Shared  memory  in  which  processes  simply  refer  to  the  same  location. 

•  Signals which are a traditional UNIX form of inter-process signaling.
The model for signals is hardware interrupts. A process sets up
vectors of signal handlers (much like hardware interrupt vectors) and then
proceeds with normal processing. When a signal is received from
another process the appropriate signal handler routine is invoked as an
asynchronous subroutine call.

•  Traditional semaphores are also available. Cooperating processes use
one of a pool of semaphores and the shared resource can be claimed
and released using P and V operations.

•  Processes are also able to pass messages. Messages are fixed size,
consisting of a message type, message length and a pointer to a buffer
or structure. Each process has a unique queue associated with it for
messages. Usually, transmitting a message is quite rapid since it is
merely a matter of copying the short message and appending it to the
receiver's queue. There are restrictions, however, on the number of
messages any one process can have queued and on the total
number of messages in the system, so it is possible that a message can
not be immediately transmitted. In this case the sending process has
the option of blocking until the message is successfully transmitted
or returning immediately and canceling the request to send. When
receiving a message a process specifies a range of message types it
will accept at this time and can optionally wait until an appropriate
message is received.
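
The message passing rules in the last item can be sketched as follows. This
is a simplified Python model, not the NRTX interface: the class and method
names are hypothetical, and only the non-blocking forms of sending and
receiving are shown.

```python
from collections import deque


class MessageSystem:
    """Per-process message queues with a per-queue and a system-wide limit."""

    def __init__(self, per_process_limit, system_limit):
        self.per_process_limit = per_process_limit
        self.system_limit = system_limit
        self.queues = {}
        self.total = 0               # messages queued in the whole system

    def register(self, pid):
        self.queues[pid] = deque()

    def try_send(self, dest, msg_type, length, buffer_ref):
        """Append a short fixed size message; give up at once if a limit is hit."""
        q = self.queues[dest]
        if len(q) >= self.per_process_limit or self.total >= self.system_limit:
            return False
        q.append((msg_type, length, buffer_ref))
        self.total += 1
        return True

    def try_receive(self, pid, low, high):
        """Return the oldest message whose type lies in [low, high], else None."""
        q = self.queues[pid]
        for i, msg in enumerate(q):
            if low <= msg[0] <= high:
                del q[i]
                self.total -= 1
                return msg
        return None
```

Only the small (type, length, pointer) record is copied, which is why a send
is normally fast; the two limits are what can make a sender wait or fail.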

NRTX essentially makes no statement about how real-time control
programs should be written. The UNIX environment and model are presented
to the programmer with some enhancements and the programmer is free to
choose the appropriate style and tools.

3.3      Harmony 

Harmony [Gent84] [Gent81] [Bootly] is a real-time system developed at
the University of Waterloo. It is a derivative of Thoth [Cher82], a general
purpose operating system. The author, W. M. Gentleman, proposes along
with Harmony a very stylized way to program real-time systems, called the
administrator concept, in the Message Passing [Gent81] paper.

Harmony is designed for multiple processors on a shared bus. It has
been implemented on Motorola 68000's and National Semiconductor 32016's.
Multiprocessor support is static in that each process is bound to a particular
processor when the system is loaded, but the communication primitives
behave the same way, independent of whether the two processes are on the same
processor or on different processors. The communication medium (shared
memory, parallel port, Ethernet, etc.) does not change the surface
characteristics of the communication although the time needed for
communication may vary.

The paradigm used for interprocess communication is message passing.
Four functions are used to implement the communication; each function
returns a process identifier or error indicator.

_Send(request, reply, id)  Request points to a variable length message to be
passed to the process indicated by id (the length is in the first
two bytes of the message). Reply points to a buffer to receive
the reply from the receiving process (again variable length).
The sending process blocks until the receiving process explicitly
replies to the request.

_Receive(request, id)  If the id is non-zero the receiving process blocks until
there is a message from the indicated process; if id is zero any
message will be received and the process will block only if no
messages are pending. The incoming message is copied into the
area pointed to by request; if the incoming message is too long it
is truncated. The identifier of the sending process is returned.

_Try_receive(request, id)  This is the same as _Receive except that the
process never blocks. If no message is available then a zero value
is returned.

_Reply(reply, id)  The process indicated by id must be blocked waiting for a
reply from this process (i.e. it must have performed a _Send).
If so the message pointed to by reply is copied into the reply
area specified by the sending process.

Three points are important here. The first is where processes block. The
sender blocks until the receiver explicitly replies. The receiver, on the other
hand, does not block during a _Reply since the sender has provided a buffer.
In fact the receiver only blocks when there are no messages (i.e. when there
is nothing to do, as we shall see). Second, replies need not be made in
any particular order. Third, all buffering is handled by the processes. This
eliminates the overhead of system buffer management and, more importantly,
means that there is no hidden blocking while waiting for a buffer. Note that
if the processes are in different address spaces some sort of system buffering
most likely takes place. Gentleman mentions the problem but does not
address it; however, one can imagine a scheme in which the request message
is not copied into the receiver's address space until the receiver is ready for
it. This is as reliable as the underlying medium. Note also that the receiver
will block during the communication delays, but what is important is that
this blocking will not lead to deadlock if the medium is reliable.
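
The blocking behavior of the four primitives can be imitated with threads.
The sketch below is a toy model in Python and not Harmony's implementation:
the class Kernel, its lower case method names and the use of library queues
are assumptions, receiving from one specific process is left out, and the
library queues stand in for Harmony's list of blocked senders even though
Harmony itself keeps no system buffers.

```python
import queue
import threading


class Kernel:
    """Toy model of blocking send, receive and reply between processes."""

    def __init__(self, pids):
        # Pending (sender, request) pairs and one reply slot per process.
        self.inbox = {pid: queue.Queue() for pid in pids}
        self.reply_slot = {pid: queue.Queue(maxsize=1) for pid in pids}

    def send(self, sender, request, dest):
        self.inbox[dest].put((sender, request))
        return self.reply_slot[sender].get()   # blocks until replied to

    def receive(self, receiver):
        return self.inbox[receiver].get()      # blocks only if nothing pending

    def try_receive(self, receiver):
        try:
            return self.inbox[receiver].get_nowait()
        except queue.Empty:
            return None

    def reply(self, message, dest):
        self.reply_slot[dest].put_nowait(message)  # never waits for the sender


def demo():
    """Two clients send to a server, which replies in reverse order."""
    kernel = Kernel(["server", "c1", "c2"])
    results = {}

    def client(pid):
        results[pid] = kernel.send(pid, "hello from " + pid, "server")

    threads = [threading.Thread(target=client, args=(p,)) for p in ("c1", "c2")]
    for t in threads:
        t.start()
    first = kernel.receive("server")
    second = kernel.receive("server")
    for sender, request in (second, first):    # order of replies is free
        kernel.reply("ack: " + request, sender)
    for t in threads:
        t.join()
    return results
```

Calling demo() leaves each client with the acknowledgement of its own
request, whichever of the two the server happened to receive first.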

The administrator concept propounded by Gentleman is similar to the
common practice of server processes. For each of the various devices (and
other resources) in a system, a process is designated to manage the device.
This process is a surrogate for the device in the system. Requests of the
device are made to the process, and information about the device is obtained
from the process. Ideally the process should closely mimic the state of the
device; in particular, it is undesirable for the server process to block while
the device is "active". Using Harmony it is possible to ensure that a server
process never blocks (except when there is nothing to do) as long as it never
sends messages. How is this done? First, processes with requests of the
device send to the server process, which handles the request and replies to
the sender; replying does not require blocking. Second, to perform tasks that
may require blocking (e.g. requests to other devices, waiting for an
interrupt) the server employs other processes (called worker processes by
Gentleman). But since the server cannot send messages, it must communicate
with these worker processes by using _Reply. The convention used is that
each worker process sends a message to the server saying essentially
"I'm available" and waits. When the administrator requires a service of a
worker it replies to this message with a description of the desired action.
This is not foolproof. For instance, what happens when all the workers are
busy is a sticky problem, but following this paradigm would eliminate many
common deadlock situations.
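
The administrator and worker convention described above can be sketched in a
few lines of Python. This is only an illustration and not Harmony code: the
class and method names are hypothetical, message passing is modelled by
direct method calls, and queueing requests when every worker is busy is an
assumption made here (the text leaves that case open).

```python
from collections import deque


class Administrator:
    """Server that only receives and replies; it never sends (hypothetical names)."""

    def __init__(self):
        self.idle_workers = deque()  # workers blocked on their "I'm available" message
        self.pending_jobs = deque()  # assumption: queue jobs when no worker is free

    def receive_available(self, worker):
        # A worker has sent "I'm available". Reply at once if work is queued,
        # otherwise leave the worker blocked until a request arrives.
        if self.pending_jobs:
            worker.reply(self.pending_jobs.popleft())
        else:
            self.idle_workers.append(worker)

    def receive_request(self, job):
        # A client request that may block is handed to a worker by replying
        # to that worker's outstanding message.
        if self.idle_workers:
            self.idle_workers.popleft().reply(job)
        else:
            self.pending_jobs.append(job)


class Worker:
    def __init__(self, admin):
        self.admin = admin
        self.current = None
        self.done = []
        admin.receive_available(self)  # initial "I'm available"

    def reply(self, job):
        # The administrator's reply carries the description of the work.
        self.current = job

    def finish(self):
        # The possibly blocking work is complete; report for duty again.
        self.done.append(self.current)
        self.current = None
        self.admin.receive_available(self)
```

The administrator's two entry points only inspect queues and reply, so
neither can block; all waiting is pushed into the workers.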

3.4      OWL 

In his thesis [Donn84] Marc Donner pursues two themes: one is the
decomposition of walking (particularly insect walking) into loosely coupled
simple sub-tasks; the other is the real-time programming language OWL, which
is well suited for specifying this sort of decomposition. To demonstrate the
effectiveness of OWL, Donner wrote a program to control the SSA (Sutherland,
Sproull, and Associates) walking machine. The SSA walking machine is a
six-legged machine large enough for a human operator to ride upon. The
machine is controlled by a pair of Motorola 68000's with shared memory
connected to a VAX 780 via a pair of RS232 lines.

OWL is designed to control the walking of an insect-like mechanism. As
in insects, control is distributed and loosely coupled. Further, since only
static stability is desired, the real-time constraints are not as oppressive
as they would be for the dynamic stability of a hopping or running machine.

The walking program consists of six main processes, one for each leg. Each
leg goes through a simple cycle controlled by a finite state machine. The
processes (legs) communicate only with their nearest neighbors. Two
additional processes are responsible for inducing the proper gait (although
this is not strictly necessary for walking). Utility processes monitor the
sensors, enforce mechanical constraints, log data for later analysis, and
perform similar functions.

In OWL the primitive executable unit is the process; every statement is
a process. In the run-time system processes are cheap, so the overhead is not
great. Donner estimates that, as implemented, the cost of starting a process
is about 8 times the cost of invoking a C function. A process interacts with
other system components in four ways:

•  It  can  become  active,  i.e.  compete  for  the  processor. 

•  It  can  terminate,  which  may  allow  other  processes  to  become  active. 

•  It  can  assert  an  alert  signal.  This  will  cause  certain  other  processes 
to  terminate.  This,  like  a  bee's  stinger,  can  only  be  used  once  (except 
that  the  stinging  process  doesn't  necessarily  die).  This  is  the  only 
system  supported  means  of  inter-process  communication. 

•  It can cause side effects, that is, change global variables, read sensors,
or affect actuators. In the walking program this is the method used
for a leg to communicate with its neighbors.

Processes can be combined in two ways into more complex processes. A
sequence is a list of processes that are performed in the order in which they
are coded, each waiting until the previous one has completed before starting.
Each sequence is a loop which is terminated either by an explicit call to a
termination process or by an alert signal from another process. The other
way of combining processes is a concurrence. A concurrence is a list of
processes performed concurrently. The concurrence terminates when all its
sub-processes have terminated. When a process asserts its alert signal all
other processes in the same concurrence are terminated. Thus a process can
only signal processes included at compile time in the same concurrence.

OWL can only exist in an environment in which processes are very cheap.
For instance, no case statement exists, but the effect is achieved by a
concurrence of processes, each representing a branch of the case, with each
inappropriate process (branch) immediately terminating itself. Although
conceptually OWL works this way, a smart compiler could improve the resulting
code by actually implementing a case statement when the semantics permit.
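
The behavior of a concurrence, its alert signal, and the case statement
built from a concurrence can be imitated with Python generators. This is a
sketch of the semantics as described here, not OWL syntax or Donner's
run-time system: the names ALERT, concurrence, servo, watchdog and branch
are invented for the illustration, each yield stands for a point at which a
process gives up the processor, and round-robin scheduling is an assumption.

```python
ALERT = object()  # yielded by a process to assert its alert signal


def concurrence(*processes):
    """Run generator-based processes round-robin until all have terminated.

    If a process yields ALERT, every other process in this concurrence is
    terminated; the signalling process itself keeps running.
    """
    live = list(processes)
    while live:
        for proc in list(live):
            if proc not in live:
                continue  # terminated by a sibling's alert earlier in this pass
            try:
                signal = next(proc)
            except StopIteration:
                live.remove(proc)
                continue
            if signal is ALERT:
                for other in live:
                    if other is not proc:
                        other.close()
                live = [proc]


def servo(log):
    while True:  # a loop that never terminates on its own
        log.append("servo")
        yield


def watchdog(limit):
    for _ in range(limit):
        yield
    yield ALERT  # terminates the sibling servo loop


def branch(selector, value, action, log):
    """One branch of a case built as a concurrence."""
    if selector != value:
        return  # an inappropriate branch terminates itself immediately
    log.append(action)
    yield


servo_log = []
concurrence(servo(servo_log), watchdog(2))

case_log = []
concurrence(*(branch(2, v, name, case_log)
              for v, name in [(1, "one"), (2, "two"), (3, "three")]))
```

In the second call only the branch whose value matches the selector does any
work, which is the effect of a case statement obtained without one.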

OWL has much of the appeal of LISP. It has as its core simple
statements/processes that are recursively combined as sequences and
concurrences into more complex structures. It nicely describes the walking
program, but again it seems difficult to specify tasks that are not so simply
repetitive and whose subtasks are highly interactive.

There  are  four  options  for  passing  parameters  to  processes: 

1.  Pass by value. A copy of the parameter is passed to the invoked process.

2.  Value-result. A copy of the parameter is passed to the process and
upon termination the new value of the parameter is passed back.

3.  Valstar. A copy of the parameter is passed to the process each time
the process becomes active (i.e. obtains the processor). That is, the
process always sees the freshest value of the parameter. Thus a servo
process can be passed the target and actual positions and each time
it is invoked the most recent values will be available.

4.  Varstar. It is like valstar in that the process always has the
freshest value of the parameter, and in addition whenever the process is
suspended or terminated the value is returned to the source.
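
The four modes differ only in when a value is copied between the caller's
variable and the process's private copy. The following sketch makes those
copy points explicit. It is not taken from the OWL implementation: Source,
Param and the three hook methods are hypothetical names, and the hooks are
assumed to be called by a run-time system at activation, suspension and
termination.

```python
class Source:
    """A variable owned by the invoking process (hypothetical helper)."""

    def __init__(self, value):
        self.value = value


class Param:
    MODES = ("value", "value-result", "valstar", "varstar")

    def __init__(self, source, mode):
        if mode not in self.MODES:
            raise ValueError(mode)
        self.source = source
        self.mode = mode
        self.local = source.value  # every mode copies in at invocation

    def on_activate(self):
        # valstar and varstar refresh the copy each time the process runs.
        if self.mode in ("valstar", "varstar"):
            self.local = self.source.value

    def on_suspend(self):
        # Only varstar copies back whenever the process is suspended.
        if self.mode == "varstar":
            self.source.value = self.local

    def on_terminate(self):
        # value-result and varstar copy back at termination.
        if self.mode in ("value-result", "varstar"):
            self.source.value = self.local
```

A servo process would take its target position as a valstar parameter, so a
change made by another process is seen at the next activation without any
message being exchanged.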

OWL is presented as part of a project, so some experimental results
are available. The SSA machine walked successfully under Donner's
walking program in OWL. It could take a step once every 10 to 15 seconds.
The mechanics of the walker are primarily responsible for the slowness. The
program is robust enough that "snipping off a leg" does not stop the
walking. This is more a result of the task being performed and the way it
was decomposed than of intrinsic characteristics of OWL.

3.5      IBM  General-Purpose  Control  Architecture 

IBM General-Purpose Control Architecture [Tayl86] is a proposed
architecture (hardware and software) for real-time control. A high level
system (or Programming System) is connected via a real-time bridge (shared
memory) to the Real-Time System. The high level system is programmed in a
variant of AML [IBM ], a high level language for manufacturing control. The
real-time system has a supervisory processor and multiple auxiliary
processors on a common bus. The high level system issues verbs to the
real-time system. Verbs are data-flow graphs composed of lower level verbs to
be executed on the real-time system. The problem of scheduling on the
real-time system is reduced by translating verbs into action sequences, which
are sequential program segments that are to be executed on a single processor
without interaction.

3.6  GEM 

GEM [Schw85] is an operating system used for a six-legged walker developed
at Ohio State University. The hardware system consists of 16 Intel 8086's on
two Multibuses. GEM has two types of processes: normal processes, called
processes, and faster processes, called micro-processes. Micro-processes
belong to a process and are like co-routines within the process.
Micro-processes are limited in scope. A typical micro-process examines some
input ports, calculates, and writes to output ports, which in turn are inputs
to some other process or micro-process. In this manner, micro-processes are
used to implement servo-loops.

GEM implements three types of inter-process communication. In the
first type (asynchronous execution with data loss) the receiving process
assumes data is always available, and if nothing new has arrived the previous
value is used. This is appropriate for servo processes. The second type is
normal message passing, i.e. every message sent is received and the receiver
will block waiting for a message. The third type is a hybrid in which data
may be lost if it becomes too old. This is used, for instance, in data
logging, in which lost data is preferable to clogging up the system. A single
communication mechanism is used to implement all three communication types.
This mechanism is an inter-processor mailbox with a message pool.
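
One way a single mailbox could serve all three communication types is
sketched below. It is an illustration only and does not reproduce GEM's
mailbox: the class, the mode names and the explicit `now` timestamps are
assumptions, and where a real receiver would block the sketch simply returns
None.

```python
from collections import deque


class Mailbox:
    """Single mailbox with a message pool and three receive policies (hypothetical)."""

    def __init__(self, mode, max_age=None, initial=None):
        if mode not in ("latest", "reliable", "aging"):
            raise ValueError(mode)
        self.mode = mode
        self.max_age = max_age  # only used in "aging" mode
        self.pool = deque()     # (timestamp, data) pairs, oldest first
        self.last = initial     # value reused in "latest" mode

    def send(self, data, now):
        self.pool.append((now, data))

    def receive(self, now):
        if self.mode == "latest":
            # Type 1: data is always "available"; unread older values are lost.
            if self.pool:
                self.last = self.pool[-1][1]
                self.pool.clear()
            return self.last
        if self.mode == "aging":
            # Type 3: discard anything that has become too old.
            while self.pool and now - self.pool[0][0] > self.max_age:
                self.pool.popleft()
        if not self.pool:
            return None  # Type 2 would block here; this sketch returns None
        return self.pool.popleft()[1]
```

The three policies share the same pool and differ only in what the receive
operation is allowed to throw away.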

3.7  MUSE 

MUSE [Sieg85] [Sieg86] [Nara86] is a system developed to control the
Utah/MIT hand. This hand is an anthropomorphic hand consisting of four
fingers and four joints per finger. With 16 degrees of freedom, 16 position
sensors, and 16 torque sensors, the hand is more complex than the other
devices discussed in this paper.

The system runs on multiple Motorola 68020 computers on a shared bus.
The bus is also connected to a Sun3 which is used for program development,
file service, high level control, and debugging. An early version of the
system had 9 Motorola 68000 processors on the bus; the current version uses 4
of the more powerful 68020 processors. All of the memory in all the
processors is dual-ported and visible to all the other processors. This
greatly simplifies inter-processor communication since it is often sufficient
to pass a pointer to the message.

MUSE does not support processes. The basic unit of execution is the
routine. A routine can be performed asynchronously, usually at the request
of an external processor. The invoking procedure may or may not wait
for the invoked procedure to complete. Generally, though, procedures are
scheduled for repetitive execution. Thus, the flavor of MUSE is of multiple
servo loops which are, from the system's point of view, independent of each
other. Each servo loop consists of a servo rate and a procedure to be invoked.
For a loop to run at 50Hz, the scheduler promises that the procedure will
be executed once in every 20ms period, but there is no guarantee that the
procedure will be scheduled at exactly 20ms intervals. In this example, the
interval between invocations could be almost 40ms in one instance and nearly
zero the next. The priority for execution is determined by the servo rate;
thus higher frequency loops are given preference and can preempt slower
servos. Because the servo loops are not coroutines they can use the same
stack; in fact, only one stack is used for each processor.

The servo loop scheduler, or SLS, maintains a list of servos in a process
table in priority order. To find a servo procedure to invoke, the SLS scans
the process table and invokes the first marked procedure. A procedure is
scheduled by marking its entry in the table. Thus in the above example
there is a time-driven procedure that every 20ms marks the entry for the
servo's procedure, and then the SLS will in a timely manner discover the
mark and invoke the procedure. One result of this is that it is easy to detect
when a processor is over-burdened. If the scheduler attempts to mark a
servo procedure that is already marked, that means the procedure was not
invoked during the previous cycle and the loops have overrun the system.
MUSE, in fact, allows a small amount of overrun so that special events which
require extra processing can be accommodated.
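
The mark-and-scan scheme, together with the overrun test, is simple enough
to sketch. The code below is an illustration under stated assumptions rather
than the MUSE scheduler: the class and field names are hypothetical, the
table is a Python list sorted by servo rate, and preemption of a running
procedure by a faster loop is not modelled.

```python
class ServoLoopScheduler:
    """Mark-and-scan scheduler in the style described for the SLS (hypothetical)."""

    def __init__(self):
        self.table = []    # entries kept in priority order, highest rate first
        self.overruns = 0  # count of marks that found the entry still marked

    def add_servo(self, rate_hz, procedure):
        entry = {
            "rate_hz": rate_hz,
            "period_ms": 1000.0 / rate_hz,  # e.g. 50 Hz gives 20 ms
            "procedure": procedure,
            "marked": False,
        }
        self.table.append(entry)
        self.table.sort(key=lambda e: -e["rate_hz"])
        return entry

    def mark(self, entry):
        """Called by the time-driven procedure once per period."""
        if entry["marked"]:
            # The previous mark was never serviced: the loops have overrun.
            self.overruns += 1
        entry["marked"] = True

    def run_next(self):
        """Scan the table and invoke the first marked procedure, if any."""
        for entry in self.table:
            if entry["marked"]:
                entry["marked"] = False
                entry["procedure"]()
                return True
        return False
```

Because a mark records only that the procedure is due, not when, the spacing
between two invocations can vary from nearly zero to almost two periods while
the once-per-period promise still holds.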

The primary tool for communication between the loops is messages.
When a message is sent it is made available to the recipient processor and
then the processor is interrupted to handle the message. Each message
includes an index into a list of procedures, called virtual device drivers or
VDDs, on the receiving processor. When it is interrupted, the receiving
processor performs the VDD with the remainder of the message as input data.
A message may contain the index of a VDD on the sending machine which is
to be invoked as the reply to the message. MUSE includes libraries of VDDs
for basic system functions such as starting/stopping servo loops, changing
their servo rates, and so on.
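
Message handling by VDD index amounts to a dispatch table on each processor.
The sketch below shows the idea, including a reply delivered by invoking a
VDD on the sender. It is not MUSE code: the Processor class, the tuple layout
of a message and the set_rate and on_reply procedures are all assumptions
made for the example, and an interrupt is modelled as a method call.

```python
class Processor:
    """A processor with a table of virtual device drivers (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.vdds = []  # a message selects one of these by index

    def register_vdd(self, procedure):
        self.vdds.append(procedure)
        return len(self.vdds) - 1  # index that senders put in their messages

    def interrupt(self, message, sender=None):
        # Assumed message layout: (VDD index, input data, reply VDD index or None).
        index, data, reply_index = message
        result = self.vdds[index](data)
        if reply_index is not None and sender is not None:
            # The reply is itself a message naming a VDD on the sending machine.
            sender.interrupt((reply_index, result, None))


rates = {}
replies = []


def set_rate(data):
    loop_name, hz = data
    rates[loop_name] = hz
    return "ok"


def on_reply(data):
    replies.append(data)


hand = Processor("hand")
host = Processor("host")
set_rate_index = hand.register_vdd(set_rate)
reply_index = host.register_vdd(on_reply)

# The host asks the hand controller to change a servo rate and names its reply VDD.
hand.interrupt((set_rate_index, ("finger0", 100), reply_index), sender=host)
```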

3.8      NYMPH 

NYMPH [Chen86] is a control system developed at Stanford and used to
control the Stanford/JPL hand. The hand has three fingers with four tendons
per finger and a tension sensor on each tendon.

The NYMPH architecture consists of a Sun2 computer with multiple National
Semiconductor 32016 processors on the Sun's bus. The Sun runs the
V [Cher83] operating system.

NYMPH provides no operating system for the 32016 processors; instead
two libraries are provided. One library contains routines for message passing
between the 32016's and the V system. The V system acts strictly as a
server; it cannot initiate communication with the client machines. A client
machine forms a message, passes it to the server and then interrupts the
server, which examines the message and invokes the appropriate handler. The
client processor waits in a busy wait loop for the message to be returned.
The second library provides the synchronization primitives synch_signal(n)
and synch_wait(n, patience). To participate in a synchronized event a
processor calls synch_signal(n), where n indicates the event. The processor
is then obligated to later perform synch_wait(n, patience) on the same
event. Typically, when synch_wait is called the processor blocks until all
processors participating in the event have invoked synch_wait. The second
argument, patience, modifies the blocking. If patience is zero the processor
does not block; if it is greater than zero the processor will block but will
time out after an interval proportional to patience. If patience is less than
zero the processor blocks until synchronization is complete.
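
The three patience cases can be imitated with Python threads standing in for
processors. This is a sketch of the described behavior only, not the NYMPH
library: the SyncEvents class, the use of thread identity to tell
participants apart, the boolean return value and the `tick` scale factor for
patience are all assumptions, and a condition variable replaces whatever
mechanism the 32016 processors actually use.

```python
import threading


class SyncEvents:
    """In-process stand-in for the two synchronization primitives (hypothetical)."""

    def __init__(self, tick=0.001):
        self._cond = threading.Condition()
        self._joined = {}   # event n -> ids of participants that signalled
        self._arrived = {}  # event n -> ids of participants that reached synch_wait
        self._tick = tick   # assumed seconds per unit of patience

    def synch_signal(self, n):
        # Announce that the calling "processor" takes part in event n.
        with self._cond:
            self._joined.setdefault(n, set()).add(threading.get_ident())

    def synch_wait(self, n, patience):
        """Return True if every participant of event n has reached synch_wait."""
        with self._cond:
            self._arrived.setdefault(n, set()).add(threading.get_ident())
            self._cond.notify_all()

            def complete():
                return self._arrived[n] >= self._joined.get(n, set())

            if patience == 0:
                return complete()  # do not block at all
            # patience < 0: wait indefinitely; patience > 0: wait with a timeout
            timeout = None if patience < 0 else patience * self._tick
            return self._cond.wait_for(complete, timeout)
```

A caller with zero patience can poll and do useful work between attempts,
whereas negative patience gives an ordinary barrier.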

NYMPH is truly minimal, providing almost none of the "normal" operating
system services. The V system loads "raw" programs into the processors and
turns them on with only the slim connections of messages and
synchronization. There is no support to help the user distribute the
processing load among the processors. The authors hint that a multi-tasking
operating system could be put on top of NYMPH, and one suspects that any
significant development effort would require the implementation of a more
complete operating system on top of NYMPH.

4      Conclusions 

Two trends are discernible in the systems presented here. Two of the
systems, NRTX and Harmony, provide a standard multitasking environment.
The systems are modified for real-time use, but each task is presented a
virtual machine and operates asynchronously. The remaining systems discard
conventional features and impose constraints as needed to enhance the
real-time response of the system. NYMPH is the most extreme example of what
might be called "veneer operating systems" that provide only a thin layer
between the user and the raw machine (MUSE also falls in this class). In
part, this latter trend is driven by definitive features of robot control,
most prominently the servo loop. But more important is the attempt to deliver
as much of the raw power of the underlying architecture to the control of the
robot as possible. Just as time-sharing systems have provided more services
to the users as the hardware technology has provided more computing power,
it is reasonable to expect that robot control systems will grow in
sophistication with the underlying technology. But robotics is a young field
and the complexity of the devices to be controlled is growing rapidly, so the
need for minimal operating systems will persist, at least for the near future.